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Abstract 

We address the denoising of images contaminated with multiplicative noise, e.g. speckle noise. 
Classical ways to solve such problems are filtering, statistical (Bayesian) methods, variational meth- 
ods, and methods that convert the multiplicative noise into additive noise (using a logarithmic func- 
tion), shrinkage of the coefficients of the log-image data in a wavelet basis or in a frame, and transform 
back the result using an exponential function. 

We propose a method composed of several stages: we use the log- image data and apply a reason- 
able under-optimal hard-thresholding on its curvelet transform; then we apply a variational method 
where we minimize a specialized criterion composed of an data-fitting to the thresholded coeffi- 
cients and a Total Variation regularization (TV) term in the image domain; the restored image is an 
exponential of the obtained minimizer, weighted in a way that the mean of the original image is pre- 
served. Our restored images combine the advantages of shrinkage and variational methods and avoid 
their main drawbacks. For the minimization stage, we propose a properly adapted fast minimization 
scheme based on Douglas- Rachford splitting. The existence of a minimizer of our specialized criterion 
being proven, we demonstrate the convergence of the minimization scheme. The obtained numerical 
results outperform the main alternative methods. 

1 Introduction 

In various active imaging systems, such as synthetic aperture radar, laser or ultrasound imaging, the data 
representing the underlying (unknown image) So - ^ ^ R+, 1^ C R^, are corrupted with multiplicative 
noise. It is well known that such a noise severely degrades the image (see Fig. 2(a)). In order to 
increase the chance of restoring a cleaner image, several independent measurements for the same image 
are realized, thus yielding a set of data: 

Sk = SoVk^ Uk, VA; G {1, • • • , K}, (1) 

where r]k - ^ ^ R+, and Uk represent the multiplicative and the additive noise relevant to each measure- 
ment k. Usually, Uk is white Gaussian noise. A commonly used and realistic model for the distribution 
of r]k is the one-sided exponential distribution: 



the latter is plotted in Fig. 1(a). Let us remind that is both the mean and the standard deviation of 
this distribution. The usual practice is to take an average of the set of all measurements — such an image 

1 ^ 

can be seen in (see Fig. 2(b)). Noticing that — ^^^fe ^ 0, the data production model reads 

k=i 

K K 

^=:^E^^ = ^o^E^^ = ^or/, (2) 

see e.g. [4,60,64] and many other references. A reasonable assumption is that all rjk are independent and 
share the same mean ji. Then the resultant mean of the multiplicative noise r] in (2) is known to follow 
a Gamma distribution, 

where F is the usual Gamma- function and since K is integer, F(i^) = [K — 1)!. Its mean is again fi and 
its standard deviation is ji/K. It is shown in Fig. 1(b). 

Various adaptive filters for the restoration of images contaminated with multiplicative noise have been 
proposed in the past, e.g. see [37,65] and the numerous references therein. It can already been seen that 
filtering methods work well basically when the noise is moderate or weak, i.e. when K is large. Bayesian 
or variational methods have been proposed as well; one can consult for instance [10,36,52,62] and the 
references cited therein. 

A large variety of methods — see e.g. [5,34], more references are given in § 1.1 — rely on the conversion 
of the multiplicative noise into additive noise using 

V = log S = log 5*0 + log T] = uq -\- n. (4) 

In this case the probability density function of n reads (see Fig. 1(c)): 

1 



n = logr]: pdf(n) = J ^^^^ exp K {n - /le'') . (5) 
One can prove that 

E[n] = MK)-\ogK, (6) 
Var[n] = ViW, (7) 

where 



Mz) = [^) logr(^) (8) 

is the polygamma function [2]. 

Classical SAR modeling — see [59,60] and many other references — correspond to /i = 1 in (3). Then 
(3) and (5) boil down to 

j^K^K-l^-Kv 



Pdf(r?) 



{K-l)\ 



r^K K{n-e") 

Pdf(n) = -jK^rjT- 
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Figure 1: Noise distributions. 



1.1 Multiscale shrinkage for the log-data 

Many authors — see [4,6,34,49,64] and the references given there — focus on restoring the log-data as given 



where W is the corresponding frame analysis operator, i.e. = (v^Wi)^ \/i G /. The rationale is 

that the noise Wn in y is nearly Gaussian — as seen in Fig. 1(d) — and justified by the Central Limit 
Theorem. The obtained coefficients y have been considered in different frameworks in the literature. In a 
general way, coefficients are restored using shrinkage estimators using a symmetric function T : R ^ R, 
thus yielding 



Following [27], various shrinkage estimators T have been explored in the literature, [7,13,28,43,55,63]; 
see § 2.1 for more details on shrinkage methods. Shrinkage functions specially designed for multiplicative 
noise were proposed e.g. in [4,6,64]. 

Let W be a left inverse of W, giving rise to the dual frame {wi : i e I}. Then a denoised log-image 
vr is generated by expanding the shrunk coefficients yr in the dual frame: 



Then the sought-after image is of the form St = exp'Ur- 

1.2 Our approach and organization of the paper 

We first apply (4) and then consider a tight-frame transform of the log-data. Our method to restore the 
log-image is presented in section 2. It is based on the minimization of a criterion composed of an £^-fitting 
to the (suboptimally) hard-thresholded frame coefficients and a Total Variation (TV) regularization in 
the image domain. This method uses some ideas from a previous work of some of the authors [29]. The 
minimization scheme to compute the log-restored image, explained in section 3, uses a Douglas- Rachford 
splitting specially adapted to our criterion. Restoring the sought-after image from the restored log- 
image requires a bias correction which is presented in section 4. The resultant algorithm to remove the 
multiplicative noise is provided in section 5. Various experiments are presented in section 6. Concluding 
remarks are given in section 7. 




yr[i]=T{{Wv)[i]), Vie I. 



(11) 




(12) 
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2 Restoration of the log-image 



In this section we consider how to restore a good log-image given data v : Q ^ K obtained according 
to (4). We focus basicahy on methods which, for a given preprocessed data set, lead to convex optimization 
problems. Below we comment only variational methods and shrinkage estimators since they underly the 
method proposed in this paper. 

2.1 Drawbacks of shrinkage restoration and variational methods 

Shrinkage restoration. The major problems with shrinkage denoising methods, as sketched in (11)- 
(12), is that shrinking large coefficients entails an erosion of the spiky image features, while shrinking 
small coefficients towards zero yields Gibbs-like oscillations in the vicinity of edges and a loss of texture 
information. On the other hand, if shrinkage is not sufficiently strong, some coefficients bearing mainly 
noise will remain almost unchanged — we call such coefficients outliers — and (12) suggests they generate 
artifacts with the shape of the functions Wi of the frame. A well instructive illustration can be seen in 
Fig. 2(b-h). Several improvements, such as translation invariant thresholding [22] and block thresholding 
[21], were brought to shrinkage methods in order to alleviate these artifacts. Results obtained using 
the latter method are presented in Figs. 3(c), 4(d) and 5(d) in Section 6. Another inherent difficulty 
comes from the fact that coefficients between different scales are not independent, as usually assumed, 
see e.g. [8,13,43,54]. 

Variational methods. In variational methods, the restored function is defined as the minimizer of a 
criterion J^y which balances trade-off between closeness to data and regularity constraints. 



where i/j : R+ R+ helps to measure closeness to data, V stands for gradient (possibly in a distributional 
sense), (p : R+ R+ is called a potential function and p > is a parameter. A classical choice for i/j 
is il)(u(t)^v{t)) = {u(t) — v{tyf' which assumes that the noise n in (4) is white, Gaussian and centered. 
Given the actual distribution of the noise in (9) and Fig. 1(c), this may seem hazardous; we reconsider 
this choice in (15). A reasonable choice is to use the log-likelihood of n according to (9) and this was 
involved in the criterion proposed in [36] — see (16) at the end of this paragraph. 

Let us come to the potential function Lp in the regularization term. In their pioneering work, Tikhonov 
and Arsenin [56] considered Lp(t) = t^] however it is well known that this choice for if leads to smooth 
images with fiattened edges. Based on a fine analysis of the minimizers of J^y as solutions of PDE's on 1], 
Rudin, Osher and Fatemi [53] exhibited that (p{\Vu{t)\) = \\\/u{t)\\2, where ||.||2 is the L^-norm, leads to 
images involving edges. The resultant regularization term is known as Total Variation (TV). However, 
whatever smooth data-fitting is chosen, this regularization yields images containing numerous constant 
regions (the well known stair-casing effect), so that textures and fine details are removed, see [46]. The 
method in [10] is of this kind and operates only on the image domain; the fitting term is derived from 
(3) and the criterion reads 




(13) 




(14) 
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where p depends on K. The denoised image Sq = argnhn^^ exhibit constant regions, as seen in Figs. 4(e) 
and 5(e) in Section 6. We also tried to first restore the log-image u by minimizing 

F,{u)=p\\u-v\\^^\\u\\tv (15) 

and the sought after image is of the form 5*0 = 5exp(?i) where B stands for the bias correction explained 
in section 4. Because of the exponential transform, there is no stair-casing, but some outliers remain 
visible — see Figs. 4(c) and 5(c); nevertheless, the overall result is very reasonable. The result of [53] 
was at the origin of a large amount of papers dedicated to constructing edge-preserving convex potential 
functions, see e.g. [3,20,61], and for a recent overview, [11]. Even though smoothness at the origin 
alleviates stair-casing, a systematic drawback of the images restored using all these functions ip is that 
the amplitude of edges is underestimated — see e.g. [47]. This is particularly annoying if the sought-after 
function has neat edges or spiky areas since the later are subjected to erosion. A very recent method 
proposed in [36] restores the discrete log-image using the log-likelihood of (9) and a regularized TV; more 
precisely, 

:fm w') = E («W + ^We""'^') + poh - + Plkllxv, (16) 

i 

where the denoised log-image u is obtained using alternate minimization on u and w. The TV term here 
is regularized via 111^ — 1^^112 ^1^^ resultant denoised image is given by = exp(i^)). The results present 
some improvement with respect to the method proposed in [10], at the expense of two regularization 
parameters {p and po) and twho stopping rules for each one of the minimization steps. 

2.2 Hybrid methods 

Hybrid methods [14,16,19,23,30,33,40,41] combine the information contained in the large coefficients 
y[i]^ obtained according to (10), with pertinent priors directly on the log-image u. 

Remark 1 Such a framework is particularly favorable in our case since the noise Wn[i], i e I in the 
coefficients y[i]^ i G I, have a nearly Gaussian distribution — see Fig. 1(d). 

Although based on different motives, hybrid methods amount to define the restored function u as 

minimize ^{u) 

subject to u ^ {u: \{W{u — v)) [i] \ < pi, Vi G /} . 

If the use of an edge-preserving regularization, such as TV for <l> is a pertinent choice, the strategy for the 
selection of parameters {pi}ieJ is more tricky. This choice must take into account the magnitude of the 
relevant data coefficient y[i]. However, deciding on the value of pi based solely on y[i]^ as done in these 
papers, is too rigid since there are either correct data coefficients that incur smoothing {pi > 0), or noisy 
coefficients that are left unchanged {pi =0). A way to alleviate this situation is to determine {pi)iei 
based both on the data and on a prior regularization term. Following [44,45], this objective is carried out 
by defining restored coefficients x to minimize the non-smooth objective function, as explained below. 
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2.3 A specialized hybrid criterion 

Given the log-data v obtained according to (4), we first apply a frame transform as in (10) to get 
y = Wv = Wuo + Wn. We systematically denote by x the denoised coefficients. The noise contained in 
the i-th datum reads (n, Wi) whose distribution is displayed in Fig. 1(d). The low frequency approximation 
coefficients carry important information about the image. In other words, when Wi is low frequency, then 
(n, Wi) has a better SNR than other coefficients. Therefore, as usual, a good choice is to keep them intact 
at this preprocessing stage. Let C / denote the subset of all such elements of the frame. Then we 
apply a hard-thresholding to all coefficients except those contained in 

^ThW ='^h(^M), ViG/\/*, (17) 
where the hard-thresholding operator 7h reads [27] 



The resultant set of coefficients is systematically denoted by t/Th • We choose an under optimal threshold 
T in order to preserve as much as possible the information relevant to edges and to textures, an important 
part of which is contained in the small coefficients. Let us consider 

^^H = ^ W^W^^i, (19) 
ieii 

where 

h = {ieI\h:\y\i]\>T}. (20) 

The image contains a lot of artifacts with the shape of the Wi for those y[i] that are noisy but above 
the threshold T, as well as a lot of information about the fine details in the original log-image uq. In all 
cases, whatever the choice of T, the image of the form vr^ is unsatisfactory — see Fig. 2 (b-h). 

We will restore x based on the under-thresholded data yr^ . We focus on hybrid methods of the form: 

X — argminF(x) 

X 

^ F{x) = ^^{x,yr^)^^{Wx), 

where ^ is a data-fitting term in the domain of the frame coefficients and <l> is an edge-preserving 
regularization term bearing the prior on the sought-after log-image u. The latter sought-after log-image 
u reads 

u = Wx . (22) 

Next we analyze the information content of the coefficients yr^ that give rise to our log-image u. Let 
us denote 

h = I\{hyJh)={ieI\h: \y[i]\ < T). (23) 
We are mostly interested by the information borne by the coefficients relevant to /q and /i. 

(/o) The coefficients y[i] for i £ Iq are usually high-frequency components which can be of the two types 
described below. 
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(e) T 



(f) T 



(g) T 



(h) T = 8cr 



Figure 2: (a) Noisy Lena ioi K = 1. (b) Noisy Lena obtained via averaging, see (1), for K = 10. (c)-(h) 
Denoising of data v shown in (c) where the curvelet trasform of v are hard-thresholded according to 
(17)-(19) for different choices of T where a = ^Jlp^[{K) is the standard deviation of the noise n. The 
displayed restorations correspond to exp'i;rH, see (19). 



(a) Coefficients y[i] containing essentiahy noise — in which case the best we can do is to keep them 
nuh, i.e. x[i] = y[i]; 

(b) Coefficients y[i] which correspond to edges and other details in uq. Since y[i] is difficult to 
distinguish from the noise, the relevant x[i] should be restored using the edge-preserving prior 
conveyed by <l>. Notice that a careful restoration must find a nonzero x[i], since otherwise 
x[i] =0 would generate Gibbs-like oscillations in u. 

(/i) The coefficients y[i] for i G h are of the following two types: 

(a) Large coefficients which carry the main features of the sought-after function. They verify 
y[i] ^ {wi^uo) and can be kept intact. 

(b) Coefficients which are highly contaminated by noise, characterized by \y[i]\ ^ \{wi^uo)\. We 
call them outliers because if we had x[i] = y[i]^ then u would contain an artifact with the 
shape of Wi since by (19) we get vr^i = + ^W^^i- Instead, x[i] must be restored 
according to the prior ^. 

This analysis clearly defines the goals that the minimizer x of Fy is expected to achieve. In particular, 
X must involve an implicit classification between coefficients that fit to t/Th exactly and coefficients that 
are restored according to the prior term <l>. In short, restored coefficients have to fit t/Th exactly if they 
are in accordance with the regularization term <l> and have to be restored via the later term otherwise. 



Since [44,45] we know that criteria Fy where ^ is non-smooth at the origin (e.g. i^) can satisfy : 
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for coefficients that are in accordance with the prior <l>, while the others coefficients are restored according 
to <l>, see also [29]. For these reasons, we focus on a criterion on the form 



Fy{x) = ^{x)^^{x) (24) 

where 

= ^A,|(x-yr„)W| = E Ai|(x-y)[i]| + ^A.|a:[i]|, (25) 

iei ieiiui^ ieio 

^x) = [ \VWx\ds = WWxWty. (26) 

Note that in (24), as well as in what follows, we write Fy in place of Fy^^ in order to simplify the 
notations. 

In the pre-processing step (18) we would not recommend the use of a shrinkage function other than 
7h since it will alter all the data coefficients yr^ without restoring them faithfully. In contrast, we base 
our restoration on data t/Th where all non-thresholded coefficients keep the original information on the 
sought-after image. 

The theorem stated next ensures the existence of a minimizer for Fy as defined in (24) and (25)- (26). 
Its proof can be found in [29]. 

Theorem 1 [29] For y G ^^(/) and T > given, consider Fy as defined in (24), where Q eK^ is open, 
bounded and its boundary is Lipschitz. Suppose that 

1. {wi}i^i is a frame of L'^{Vt) and the operator W is the pseudo-inverse ofW; 

2. Amin = min > 0. 
Then Fy has a minimizer in 

Let us remind that the minimizer of Fy is not necessarily unique. Given y, denote 

Gy "^'{xe f (/) : Fy {x) = ^min ^ Fy {x) } . (27) 

Hopefully, for every sample of the preprocessed data yr^ , the set Gy is convex and corresponds to images 
Wx which are very similar since they share the same level lines. The theorem below confirms this assertion 
and is proven in [29]. 

Theorem 2 [29] If xi and X2 are two minimizers of Fy (i.e. xi G Gy and X2 G Gy), then 

VWxi (X VWx2, a.e. on Vt. 

In other words, Wxi and Wx2 have the same level lines. 

In words, images Wxi and Wx2 are obtained one from another by a local change of contrast which 
is usually invisible for to the naked eye. 
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Some orientations for the choice of were investigated in [29]. If i G /i, the parameter should 
be close to, but below the upper bound Ili^iUxv^ since above this bound, the coefficients y[i] cannot be 
changed. For i G /q, a reasonable choice is 



\i = max 



[Vwi) ds 
\Vwk\ 



where stands for transposed. If A^ is below this bound, some neighboring outliers might not be 
properly removed although Gibbs oscillations are better reduced. Another important remark is that, for 
some multiscale transforms, the bounds discussed above are constant. In the proposed model, we use 
only two values for A^, depending only on the set the index i belongs to. 

We focus on the coefficients of a curvelets transforms of the log-data because (a) such a transform 
captures efficiently the main features of the data and (b) it is a tight-frame which is helpful for the 
subsequent numerical stage. 

3 Minimization for the log-image 

Let us rewrite the minimization problem defined in (24) and (25)- (26) in a more compact form: find x 
such that Fy{x) = min^; Fy for 

min^ = ^ + <!>, 



where [ ^ f 
1^ <l>(x 



x) = \\K{x-yr^)\\^, for A = diag(Ai)ie/, ^^^^ 

) = IIH^^IItv. 



where A^ are the coefficients given in (25). Clearly, ^ and ^ are proper lower-semicontinuous convex 
functions, hence the same holds true for Fy. The set Gy introduced in (27) is non-empty by Theorem 1 
and can be rewritten as 

where dFy stands for subdifferential operator. Minimizing Fy amounts to solving the inclusion 

G dFy{x) , 

or equivalently, to finding a solution to the fixed point equation 

x = {ld^-fdFy)-^{x) , (29) 

where (Id + jdFy)~^ is the resolvent operator associated to dFy^ 7 > is the proximal stepsize and Id 
is the identity map on the Hilbert space ^^(/). The proximal schematic algorithm resulting from (29), 
namely 

^(m) ^ {id^jdFy)-\x^'^), 

is a fundamental tool for finding the root of any maximal monotone operator [31,51], such as e.g. the 
subdifferential of a convex function. Since the resolvent operator {ld-\- 'ydFy)~^ for Fy in (28) cannot be 
calculated in closed-form, we focus on iterative methods. 
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Splitting methods do not attempt to evaluate the resolvent mapping (Id + 7(9^ + d^))~^ of the 
combined function Fy^ but instead perform a sequence of calculations involving separately the resolvent 
operators (Id + 79^)~^ and (Id + 79<I>))~^. The latter are usually easier to evaluate, and this holds true 
for our functionals ^ and <l> in (28). 

Splitting methods for monotone operators have numerous applications for convex optimization and 
monotone variational inequalities. Even though the literature is abundant, these can basically be system- 
atized into three main classes: the forward-backward [35,57,58], the Douglas/Peaceman-Rachford [39], 
and the little-used double-backward [38,48]. A recent theoretical overview of all these methods can be 
found in [24,32]. Forward-backward can be seen as a generalization of the classical gradient projection 
method for constrained convex optimization, hence it inherits all its restrictions. Typically, one must 
assume that either ^ or <I> is differentiable with Lipschitz continuous gradient, and the stepsizes 7 must 
fall in a range dictated by the gradient modulus of continuity; see [26] for an excellent account. Obviously, 
forward-backward splitting is not adapted to our functional (29). 

3.1 Specialized Douglas- Rachford splitting algorithm 

The Douglas/Peaceman-Rachford family is the most general preexisting class of monotone operator split- 
ting methods. Given a fixed scalar 7 > 0, let 

^7^^ =^ (Id + 7^^)"^ and J^q^ =^ (Id + 7^^)"^ (30) 

Given a sequence /i^ G (0, 2), this class of methods can be expressed via the recursion 



(1 - y) Id + y (2J^a^ - Id) o (2J^a^ - Id)] x^'^ . (31) 



Since our problem (28) admits solutions, the following result ensures that iteration (31) converges for 
our functional Fy. 

Theorem 3 Let 7 > and jit G (0,2) he such that ^t^^^ l-^t{'^ — l^t) = +00. Take x^^^ G ^^(/) and 
consider the sequence of iterates defined by (31). Then, {x^'^^)t^^ converges weakly to some point x G 
and Jjd^{x) G Gy. 

This statement is a straightforward consequence of [24, Corollary 5.2]. For instance, the sequence /it = 
l,Vt G N, satisfies the requirement of the latter theorem. 
It will be convenient to introduce the reflection operator 

rprox^ = 2prox^ — Id. (32) 

where prox^ is the proximity operator of (p according to in Definition 1. Using (35) and (32), the 
Douglas-Rachford iteration given in (31) becomes 



1 - y j Id + yrprox^^ o rprox^^ 



x^'^ . (33) 
Below we compute the resolvent operators J^d^ and J^q^ with the help of Moreau proximity operators. 
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3.2 Proximal calculus 

Proximity operators were inaugurated in [42] as a generalization of convex projection operators. 

Definition 1 (Moreau [42]) Let (p be a proper, lower-semicontinuous and convex function defined on 
a Hubert space H. Then, for every x ^ 7i, the function z (p{z) -\- \\x — z\f /2, for z ^ H, achieves its 
infimum at a unique point denoted by prox^x. The operator prox^ :H ^ H thus defined is the proximity 
operator of cp. 

By the minimality condition for prox^, it is straightforward that \/x^p G 7i we have 

p = prox^x <=> X — p e d(p{p) <^=^ (Id + d(p)~^ = prox^. (34) 

Then (30) reads 

Jjd^ = prox^^ and J^q^ = prox^^. (35) 
3.2.1 Proximity operator of ^ 

The proximity operator of 7^ is established in the lemma stated below. 
Lemma 1 Let x e Then 

prox^^(x) = (^^Th W + ^s^^' {x\i] - ^Th W)) .^^ , (36) 

with 

rs^^'{z[i]) = max{0, z[i] - -fX^sign{z[i])}. (37) 



Proof. ^ is an additive separable function in each coordinate i G L Thus, solving the proximal 
minimization problem of Definition 1 is also separable. For any convex function cp and v G ^^(/), put 
V^(.) = p{. -v). Then 

p = prox^(x) <^=^ X — p e dil){p) 

<^=^ {x — v) — {p — v) ^ dp{p — v) 
<^=^ p — V = prox^(x — v) 
<^=^ p = V -\- prox^(x — v) . 

For each i G /, we apply this result with v = ^ThW and p{z[i]) = jXi\z[i]\. Noticing that prox^ = Ts^'^* 
is soft-thresholding with threshold jXi, leads to (36). o 

Note that now 

rprox^^(x) = 2 (^Th W + Ts^^' {x[i] - ^Th W))^^^ - x . (38) 
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3.2.2 Proximity operator of <l> 

Clearly, ^{x) = || • ||tvo^(^) is a pre-composition of the TV- norm with the linear operator W. Computing 
the proximity operator of <l> for an arbitrary W may be intractable. We adopt the following assumptions: 

(wl) W : £'^{I) L^{n) is surjective; 

(w2) WW = Id and W = c~^W* for < c < oo, where W* stands for the adjoint operator; note that 
we also have = c Id; 

(w3) W is bounded. 

For any z{t) = {zi{t), Z2{t)) G R^ t G we set \z{t)\ = Zi{ty ^ Z2{ty . Let X = L'^{n) x L'^{n) and 
(•, ■)x be the inner product in A', and ||| • ||| , for p G [1, oo] the L^-norm on X. We define A") as the 
closed L°°-ball of radius 7 in X, 

{zeX: \\zl^ < 7} = = (zi,Z2) G X : \z{t)\ < 7,Vt G n], (39) 

and Pb^(x) ' ~^ ^oo(^) associated projector; it is easy to check that the latter is equal to the 
proximity operator of the indicator function of 5^ (A'). The expression for prox^.^, is given in the next 
lemma while the computation scheme to solve item (ii) is stated in Lemma 3. 

Lemma 2 Let x G ^^(/) and B^{X) is as defined above. 

(i) Denoting 6^ prox^-i^||.||^^(ii) the proximity operator of the (c~^ -scaled) TV-norm, we have 

prox^^(x) = (id - o (id - prox,-i^||.||^^) o w) {x) ; (40) 

(ii) Furthermore, 

prox^-i7lMlTvH = ^ - Pc{u) , (41) 

where 

C = {div(z) G L^{n)\z G C^{n xn),ze BZ^\X)y (42) 

Proof. Since W is surjective, its range is L'^{Q) which is closed. Moreover, the domain dom(|| • 
||tv) = L'^{^) as well, so that cone (dom|| • ||tv — range W^ = {0} which is a closed subspace of L'^{ft). 
Reminding that || • ||tv is lower bounded, continuous and convex, it is clear that all assumptions required 
in [25, Proposition 11] are satisfied. Applying the same proposition yields statement (i). 

We focus next on (ii). Note that C in (42) is a closed convex subset since 5^^^ (A') is closed and 
convex, and div is linear; thus the projection Pq is well defined. 

Let us remind that the Legendre-Fenchel (known also as the convex- conjugate) transform of a function 
(f : H ^ where H is an Hilbert space, is defined by 

ip''{w)= sup {{w,u) - ip{u)}, 

nGdom((/?) 
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and that is a closed convex function. If is convex, proper and lower semi-continuous, the original 
Moreau decomposition [42, Proposition 4. a] tells us that 

prox^ + prox^* = Id . (43) 

One can see also [26, Lemma 2.10] for an alternate proof. It is easy to check that the conjugate function 
of a norm is the indicator function i of the ball of its dual norm, see e.g. [9, Eq.(2.7)]; thus 



(c-M|.||tv)*W 



if z G C , 
+00 if z C , 



where C is given in (42). Using Definition 1, it is straightforward that 

prox/ _ Y = Pc. 

[C W||.||Tvj 

Identifying c~^7||.||tv with (p and (c~^7||.||tv)* with (/:?*, equation (43) leads to statement (ii). The proof 
is complete. o 

Note that our argument (43) for the computation of prox^-i^||.||^^(2i) is not used in [18], which instead 
uses conjugates and bi-conjugates of the objective function. 

Remark 2 In view of equations (41) and (42), we one can see that the term between the middle paren- 
theses in equation (40) admits a simpler form: 

Id-prox,-i^l|.|l^^ =Pc. 

Using (32) along with (40)- (41) we easily find that 

rprox^^(x) = (id - 2W o (id - prox^-i^||.||^^^ o {x) 

= (id -2WoPcoW^ (x) . (44) 

3.2.3 Calculation of the projection Pc in (41) in a discrete setting 

In what follows, we work in the discrete setting. We consider that that W is an M x N tight frame with 
M = #/ ^ admitting a constant c > such that 

WW = ld and W = c'^W^ (hence W^W = eld). 

(This is the discrete equivalent of assumption (w2).) We also suppose that W : ^^(/) £'^{ft) is surjective. 
Next we replace A' by its discrete counterpart, 

a: = f{n) xf{n) where Q is discrete with #n = N. (45) 

We denote the discrete gradient by V and consider Div : X £'^{Q) the discrete divergence defined by 
analogy with the continuous setting ^ as the adjoint of the gradient Div = —V*; see [18]. 



-•^More precisely, let u G be of size m x n, N = mn. We write (Vu)[i,j] = (ii[i + 1, j] — u[i,j], u[i,j + 1] — u[i,j]) 

with boundary conditions n[m + = u[m,i], \/i and u[i,n-\- 1] = u[i,n], Vi; then for z G A", we have (Div(z))[i, j] = 
(2:1 [z,j] — 2:1 [z — 1, j]) + (2;2[z,j] — 22 [i, J — 1]) along with 2i[0,i] = zi[m,i] = 22[i,0] = Z2[i,n] = 0, Vi. 
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Unfortunately, the projection in (41) does not admit an explicit form. The next lemma provides an 
iterative scheme to compute the proximal points introduced in Lemma 2. In this discrete setting, C in 
(42) admits a simpler expression: 

C = {mv{z) e £Hn) I z e B^^\X)}. , (46) 
where B^^^{X) is defined according to (39). 

Lemma 3 We adapt all assumptions of Lemma 2 to the new discrete setting, as explained above. Con- 
sider the forward-backward iteration 

^(^+1) = P;^^(^) (z^'^ + (Div(z(^)) - cu^y^ for < inf A < sup A < 1/4, (47) 
where G 

^ otherw^se. 

Then 

(i) {z^^'^)te'N converges to a point z G B^{X); 

(a) (c~^jDiY{z^*^)] converges to c~"^7Div(i) = (Id — prox^-i^||.||^^)(ii). 



Proof. Given u G ^^(^^), the projection w = Pc{u) defined by (41) and (46) is unique and satisfies 

2 



W 



arg min - \\u — w\\ 

wec 2 



arg mm < 





c 




—u — w 




7 



subject to w = Div(2;) for z G B^{P(!) 



t 



Div(i) where z = arg min 



—u — Div(z) 

7 



(48) 



This problem can be solved using a projected gradient method (which is a special instance of the forward- 
backward splitting scheme) whose iteration is given by (47). This iteration converges weakly to a mini- 
mizer of (48) — see [24, Corollary 6.5], provided that the stepsize Pt > satisfies sup^ < 2/(5^, where 6 
is the spectral norm of the div operator. It is easy to check that S'^ < 8 — see e.g. [18]. 
Set 

'(^^ = c"SDiv(z(^)),Vt G N and u; = c"SDiv(z). 



Thus, 



00^'^ - Co 



= Q)^||Div(^('))-Div(i) 
= (-V (Div(z(^)) - Div(i)) , z^'^ - z)x 



(49) 



where we use the fact that —V is the adjoint of Div. Let denote the gradient of a scalar-valued 
function of z, not to be confused with the discrete gradient operator V of an image. The gradient of the 
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function ^ ||cii/7 — Div(z)||^ with respect to z is IIci^/t — Div(z)||^^ = —V (Div(z) — cuj^). This 

relation together with the Schwarz inequahty apphed to (49) lead to 



H\2 



< |||VDiv(2(*))-VDiv(i)|||2|||2( 
= (^)^|||V (Div(2(*))-cV7) - V (Div((i) -cu/t)!!^!^^*) -f|||2 
= 0.5 (^)' |||D, (jcu/j - Div(2(*))|p') - D, (||cV7 - Div(5)f ) \U\\z^'^ - z\\\^ (50) 



From [24, Theorem 6.3], we deduce that the series 

^ |||D, (||cV7 - Div(.)ll) - (||cV7 - Div(.)||) {z 



2 
2 

teN 



is convergent. Inserting this property in (50) and using the fact that the sequence (z'^*^)tGN is bounded 
(as it converges weakly with |||i|||2 < liiiiinft^cx) |||^^^^ |||2)5 follows that o;^^^ converges strongly to cu. 
This completes the proof. o 

The forward-backward splitting-based iteration proposed in (47) to compute the proximity operator 
of the TV-norm is new and different from the projection algorithm of Chambolle [18], even tough the 
two algorithms bear some similarities. The forward-backward splitting allows to derive a sharper upper- 
bound on the stepsize /3t than the one proposed in [18] — actually twice as large. Let us remind that it 
was observed in [18] that the bound 1/4 still works in practice. Here we prove why thus is really true. 

3.3 Comments on the Douglas-Rachford scheme for Fy 

A crucial property of the Douglas-Rachford splitting scheme (33) is its robustness to numerical errors 
that may occur when computing the proximity operators prox^ and prox^ , see [24] . We have deliberately 
omitted this property in (33) for the sake of simplicity. This robustness property has important conse- 
quences: e.g. it allows us to run the forward-backward sub-recursion (47) only a few iterations to compute 
an approximate of the TV- norm proximity operator in the inner iterations, and the Douglas-Rachford 
is still guaranteed to converge provided that these numerical errors are under control. More precisely, 
let at G be an error term that models the inexact computation of prox^,^ in (40), as the latter is 

obtained through (47). If the sequence of error terms {at)^^-^ and stepsizes (/i^)^^js^ defined in Theorem 3 
obey Xl^^jv^ l^t \\cit\\ < +oo, then the Douglas-Rachford algorithm (33) converges weakly [24, Corollary 6.2]. 
In our case, using 200 inner iterations in (47) was sufficient to satisfy this requirement. 

Remark 3 Owing to the splitting framework and proximal calculus, we have shown in Lemma 2 that 
the bottleneck of the minimization algorithm is in the computation of the proximity-operator of the TV- 
norm. In fact, computing proxy. amounts to solving a discrete ROF-denoising. Our forward-backward 
iteration is one possibility among others, and other algorithms beside [18] have been proposed to solve 
the discrete ROF-denoising problem. While this paper was submitted, our attention was drawn to an 
independent work of [12] who, using a different framework, derive an iteration similar to (47) to solve 
the ROF. Another parallel work of [66] propose an application of gradient projection to solving the dual 
problem (48). We are of course aware of max-flow/min-cut type algorithms, for instance the one in [17]. 
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We have compared our whole denoising procedure using our implementation of proxy. and the max- 
flow based implementation that we adapted from the code available at [1]. We obtained similar results, 
although the max- flow-based algorithm was faster, mainly because it uses the £^ approximation of the 
discrete gradient, namely {\/u)[i^j] = + 1, j] — j]| + j + 1] — Let us remind that 

this approximation for the discrete gradient does not inherits the rotational invariance property of the 
norm of the usual gradient. 

4 Bias correction to recover the sought-after image 

Recall from (4) that uq = log 6*0 and set u = Wx'^^^^^ as the estimator of uq^ where A^dr is the number of 
Douglas- Rachford iterations in (33). Unfortunately, the estimator u is prone to bias, i.e. E [u] = uo — bu. 
A problem that classically arises in statistical estimation is how to correct such a bias. More importantly 
is how this bias affects the estimate after applying the inverse transformation, here the exponential. Our 



goal is then to ensure that for the estimate S of the image, we have E 
neighborhood of E [u] , we have 



= So- Expanding S in the 



E[expu] = exp(E[7i])(l + Var[^]/2 + i?2) 

= 5oexp(-6^)(l+Var[7i]/2 + i?2) , (51) 

where R2 is expectation of the Lagrange remainder in the Taylor series. One can observe that the posterior 
distribution of u is nearly symmetric, in which case R2 ^ 0. That is, bu ^ log(l + Var [u] /2) to ensure 
unbiasedness. Consequently, finite sample (nearly) unbiased estimates of uq and So are respectively 
u + log(l + Var [u] /2), and exp (u) {1 -\- Var [u] /2). Var [u] can be reasonably estimated by iIji{K), the 
variance of the noise n in (4) being given in (7). Thus, given the restored log-image our restored image 
read: 

S = exp{u){l^MK)/2) . (52) 

The authors of [64] propose a direct estimate of the bias bu using the obvious argument that the 
noise n in the log-transformed image has a non-zero mean ipo{K) — logi^. A quick study shows that the 
functions (1 + ipi{K)/2) and exp(logi^ — tlJo{K)) are very close for K reasonably large. Thus, the two 
bias corrections are equivalent. Even though the bias correction approach we propose can be used in a 
more general setting. 

5 Full algorithm to suppress multiplicative noise 

Now, piecing together Lemma 1, Lemma 2 and Theorem 3, we arrive at the multiplicative noise removal 
algorithm: 

Task: Denoise an image S contaminated with multiplicative noise according to (2). 

Parameters: The observed noisy image /S, number of iterations TVdr (Douglas- Rachford outer iterations) 

and Nfb (Forward- Backward inner iterations), stepsizes /it ^ (0? 2), < l3t < 1/4 and 7 > 0, tight-frame 

transform W and initial threshold T (e.g. T = 2^/tlji{K))^ regularization parameters Ao,i associated to 

the sets /o,i- 

Specific operators: 
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(a) Ts^^'(z) = (^max{0, - 7AiSign(z[i])}) ^, Mz &^*' . 
{z[i,3] \i\z[i,j]\<l; 

(c) V and Div — the discrete versions of the continuous operators V and div. 

(d) V^i(-) defined according to (8) (built-in Matlab function, otherwise see [50]). 
Initialization: 

• Compute V = log 5 and transform coefficients y = Wv. Hard-threshold at T to get yr^. Choose 
an initial x^^\ 

Main iteration: 

For t = 1 to TVdr, 

(1) Inverse curvelet transform of x^'^^ according to u^'^^ = Wx^'^\ 

(2) Initialize For s = to TVpB - 1 

(3) Set z(^) =^(^^B). 

(4) Compute w^'^^ = c~-^7 Div(2;'^^^). 

(5) Forward curvelet transform: a^^^ = Ww^'^\ 

(6) Compute r*^^^ = rprox^<|,(x'^^^) = x^'^^ — 2a^'^\ 

(7) By (38) compute q^'^ = (rprox^^ o rprox^^) x^'^ = 2 (yr^[i] + Tg^^^ (r^^^H - ^Th W)) .^^ - r^'^ . 

(8) Update using (33): x^^+i) = [i - i^^j x^'^ + ^g(^) . 
End main iteration 

Output: Denoised image S = exp {Wx^^^^^^ (1 + V^i(i^)/2). 

Remark 4 (Computation load) The bulk of computation of our denoising algorithm is invested in 
applying W and its pseudo-inverse W. These operators are of course never constructed explicitly, rather 
they are implemented as fast implicit analysis and synthesis operators. Each application of or 
cost O(A^logiV) for the second generation curvelet transform of an A^-pixel image [15]. If we define 
TVdr and A^fb as the number of iterations in the Douglas- Rachford algorithm and the forward-backward 
sub-iteration, the computational complexity of the denoising algorithm is of order A^DR^FB2A^log 
operations. 
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6 Experiments 



In all experiments carried out in this paper, our algorithm was run using second-generation curvelet tight 
frame along with the following set of parameters: Vt, /it = 1, = 0.24, 7 = 10 and A/'dr = 50. The initial 



threshold T was set to 2^y^|Jl{K). For comparison purposes, some very recent multiplicative noise removal 
algorithms from the literature are considered: the AA algorithm [10] minimizing the criterion in (14), 
and the Stein-Block denoising method [21] in the curvelet domain, applied on the log transformed image. 
The latter is a sophisticated shrinkage-based denoiser that thresholds the coefficients by blocks rather 
than individually, and has been shown to be nearly minimax over a large class of images in presence of 
additive bounded noise (not necessarily Gaussian nor independent). We also tried the "naive" method, 
called L2-TV, where u minimizes (15) and the denoised image is given after bias correcion according 
to (52). No without surprise, one realizes that the results are quite good, even though some persistent 
outliers remain quite visible. This again raises the persistent question of relevance of PSNR (or even 
MAE) as a measure of perceptual restoration quality. For fair comparison, the hyperparameters for all 
competitors were tweaked to reach their best level of performance on each noisy realization. 

The denoising algorithms were tested on three images: Shepp-Logan phantom, Lena and Boat all of 
size 256 x 256 and with gray-scale in the range [1, 256]. For each image, a noisy observation is generated 
by multiplying the original image by a realization of noise according to the model in (2)- (3) with the 
choice /i = 1 and = 10. For a A/'-pixel noise-free image 5*0 and its denoised version by any algorithm 
5, the denoising performance is measured in terms of peak signal-to-noise ratio (PSNR) in decibels (dB) 

PSNR = 201ogio 



and mean absolute-deviation MAE 



MAE = 



S-So 



/N. 



1 

The results are depicted in Fig. 3, Fig. 4 and Fig. 5. Our denoiser clearly outperforms its competitors 
both visually and quantitatively as revealed by the PSNR and MAE values. The PSNR improvement 
brought by our approach is up to 4dB on the Shepp-Logan phantom, and is ~ IdB for Lena and Boat. 
Note also that a systematic behavior of AA algorithm is its tendency to lose some important details 
and the persistence of a low-frequency ghost as it can be seen on the error maps on the third row in 
Figs. 4 and 5. 



7 Conclusions 

This work proposes quite an original, efficient and fast method for multiplicative noise removal. The 
latter is a difficult problem that arises in various applications relevant to active imaging system, such as 
laser imaging, ultrasound imaging, SAR and many others. Multiplicative noise contamination involves 
inherent difficulties that severely restrict the main restoration algorithms. 
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The main ingredients of our method are: (1) consider the log-data to restore a log-image; (2) preprocess 
the log-fata using and under-optimal hard-thresholding of its tight frame coefficients; (3) restore the log- 
image using a hybrid criterion composed of an £^ data-fitting for the coefficients and a TV regularization 
in the log-image domain; (4) restore the sought-after image using an exponential transform along with 
a pertinent bias correction. The resultant algorithm is fast, its consistency and convergence are proved 
theoretically. 

The obtained numerical results are really encouraging since they outperform the most recent methods 
in this field. 
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(a) Shepp-Logan (256 x 256) (b) Noisy /i = 1, K = 10 




(c) Stein-block thresholding (d) Our method 




(e) (f) 

Figure 3: Performance comparison with Shepp-Logan phantom (256 x 256). (a) Original, (b) Noisy 
fi = 1, K = 10. (c) Denoised with Stein-block thresholding in the curvelet domain [21] PSNR=24.73dB, 
MAE=4. (d) Denoised with our algorithm PSNR=31.25dB, MAE=1.87. (e)-(f) Errors (restored - 
original) for (c)-(d). 
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(a) Lena (256 x 256)— original (b) Noisy: /i = 1, K = 10 (c) L2-TV 

PSNR=26.22 dB, MAE=8.5 




(d) Stein-block thresholding [21] (e) AA algorithm [10] (f) Our method 

PSNR=25.49 dB, MAE=9.45 psnr=25.37 dB, mae=9.41 psnr=26.05 dB, mae=8.8 




Figure 4: Comparative restoration of the noisy Lena in (b) using modern methods. Note that the 
algorithm in (c) is initialized with the log-data and that the restoration in (d) is done in the curvelet 
domain. The images on the last row show the error (restored — original) for (d), (e) and (f). 
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(a) Boat (256 x 256) — original 



(I)y INUiOJ. yLt/ JL, iV ±U 

see (2)-(3) 



(c) L2-TV 

PSNR=24.118dB, MAE=10.202 




(d) Stein-block thresholding [21] (e) AA algorithm [10] 

PSNR=23.57dB, MAE=10.98 PSNR=23.36dB, mae=11.08 



(f) Our method 
PSNR=24.12dB, MAE=10.2 




Figure 5: Restoration of (b) using contemporary methods. The last row shows the error images, namely 
(restored — original) for (d), (e) and (f). 
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